sanity checks

Terms from Artificial Intelligence: humans at the heart of algorithms

Raw data may contain errors, mistypings or other problems. Sanity checks are small, usually relatively simple, pieces of code run across a dataset to verify simple properties: for example, whether items indexed in a book all appear in the glossary, or whether a field that is intended to be a number does indeed contain only numbers. This is often a critical part of data cleaning (or data wrangling). In addition, when dealing with large quantities of data, it can be hard to be sure that transformations that appear sensible in code actually do what was intended, as only a small portion of the data can be inspected by hand. Sanity checks can be applied again after a transformation stage to help build confidence. Sanity checks often involve simple regular expression matching (e.g. to verify the number field) or looking up values (e.g. to check glossary items are defined).
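
As a rough illustration of the two kinds of check mentioned here, the following Python sketch tests a numeric field with a regular expression and looks up indexed terms in a glossary. The function names, the number pattern and the sample data are all hypothetical, chosen only for this example; a real dataset would need its own definition of what counts as a valid number.

```python
import re

# Assumed definition of "a number": optional sign, digits, optional decimal part.
NUMBER_RE = re.compile(r"[+-]?\d+(\.\d+)?")


def non_numeric_rows(rows, field):
    """Return (row_index, value) pairs where `field` is not a plain number."""
    return [
        (i, row.get(field))
        for i, row in enumerate(rows)
        if not NUMBER_RE.fullmatch(str(row.get(field, "")).strip())
    ]


def undefined_terms(index_terms, glossary):
    """Return indexed terms with no glossary entry (case-insensitive lookup)."""
    defined = {term.lower() for term in glossary}
    return sorted(t for t in index_terms if t.lower() not in defined)


# Made-up sample data for illustration only.
rows = [{"age": "42"}, {"age": "4 2"}, {"age": "n/a"}, {"age": "-3.5"}]
bad_rows = non_numeric_rows(rows, "age")

glossary = {"Sanity checks": "...", "Data cleaning": "..."}
missing = undefined_terms(
    ["sanity checks", "data cleaning", "overfitting"], glossary
)

print(bad_rows)
print(missing)
```

The same functions could be run once on the raw data and again after each transformation stage, so that a transformation that quietly corrupts a field shows up as a new entry in the list of failures.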

Used on pages 199, 200, 214